Overview

Dataset statistics

Number of variables: 15
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 530.9 KiB
Average record size in memory: 543.7 B

Variable types

NUM: 7
CAT: 5
BOOL: 3

Reproduction

Analysis started: 2020-06-21 09:14:03.105901
Analysis finished: 2020-06-21 09:14:17.046301
Duration: 13.94 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

TrainingTimesLastYear has 34 (3.4%) zeros
PropRoleComp has 169 (16.9%) zeros

Variables

Attrition
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value  Count  Frequency (%)
0      843    84.3%
1      157    15.7%
 

BusinessTravel
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value              Count  Frequency (%)
Travel_Rarely      709    70.9%
Travel_Frequently  199    19.9%
Non-Travel         92     9.2%
 

Length

Max length: 17
Mean length: 13.52
Min length: 10

Unicode categories (distinct characters)
Value                  Count  Frequency (%)
Lowercase_Letter       11     64.7%
Uppercase_Letter       4      23.5%
Dash_Punctuation       1      5.9%
Connector_Punctuation  1      5.9%

Unicode scripts (distinct characters)
Value   Count  Frequency (%)
Latin   15     88.2%
Common  2      11.8%

Unicode blocks (distinct characters)
Value  Count  Frequency (%)
ASCII  17     100.0%
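The Length figures and the three Unicode tables for BusinessTravel can be recomputed from the value counts alone. The sketch below assumes, consistent with the numbers shown (11 + 4 + 1 + 1 = 17), that each Unicode table counts distinct characters rather than character occurrences. The helper names and the CATEGORY_NAMES mapping are illustrative, not part of pandas-profiling.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes seen in this dataset
# (illustrative subset; unknown codes fall back to the two-letter code).
CATEGORY_NAMES = {
    "Ll": "Lowercase_Letter",
    "Lu": "Uppercase_Letter",
    "Pd": "Dash_Punctuation",
    "Pc": "Connector_Punctuation",
    "Zs": "Space_Separator",
    "Nd": "Decimal_Number",
}


def length_summary(counts):
    """Return (min, mean, max) string length, weighting each value by its count."""
    total = sum(counts.values())
    lengths = [len(value) for value in counts]
    mean = sum(len(value) * n for value, n in counts.items()) / total
    return min(lengths), mean, max(lengths)


def category_counts(values):
    """Count *distinct* characters per Unicode general category."""
    distinct_chars = set("".join(values))
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )


business_travel = {"Travel_Rarely": 709, "Travel_Frequently": 199, "Non-Travel": 92}
print(length_summary(business_travel))  # (10, 13.52, 17)
print(category_counts(business_travel))
```

Python's `unicodedata.category` returns two-letter codes such as "Ll"; the mapping only spells them out in the long form used by the report.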
 

DistanceFromHome
Real number (ℝ≥0)

Distinct count: 29
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.145
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 7
Q3: 13
95th percentile: 26
Maximum: 29
Range: 28
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.120955912
Coefficient of variation (CV): 0.8880214229
Kurtosis: -0.1030362226
Mean: 9.145
Median Absolute Deviation (MAD): 5
Skewness: 1.008336875
Sum: 9145
Variance: 65.94992492
Histogram with fixed size bins (bins=10)
Common values
Value              Count  Frequency (%)
2                  143    14.3%
1                  142    14.2%
9                  65     6.5%
7                  62     6.2%
10                 58     5.8%
3                  52     5.2%
8                  52     5.2%
4                  45     4.5%
5                  44     4.4%
6                  44     4.4%
Other values (19)  293    29.3%

Minimum 5 values
Value  Count  Frequency (%)
1      142    14.2%
2      143    14.3%
3      52     5.2%
4      45     4.5%
5      44     4.4%

Maximum 5 values
Value  Count  Frequency (%)
29     21     2.1%
28     17     1.7%
27     9      0.9%
26     18     1.8%
25     17     1.7%
 
EnvironmentSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value  Count  Frequency (%)
4      308    30.8%
3      304    30.4%
2      196    19.6%
1      192    19.2%
 

Length

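Max length: 1
Mean length: 1
Min length: 1

Unicode categories (distinct characters)
Value           Count  Frequency (%)
Decimal_Number  4      100.0%

Unicode scripts (distinct characters)
Value   Count  Frequency (%)
Common  4      100.0%

Unicode blocks (distinct characters)
Value  Count  Frequency (%)
ASCII  4      100.0%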
 

JobRole
Categorical

Distinct count: 9
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value                      Count  Frequency (%)
Sales Executive            217    21.7%
Research Scientist         209    20.9%
Laboratory Technician      166    16.6%
Manufacturing Director     97     9.7%
Healthcare Representative  90     9.0%
Manager                    74     7.4%
Sales Representative       64     6.4%
Research Director          47     4.7%
Human Resources            36     3.6%
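The common-values tables of the numeric variables list the ten most frequent values and fold the remainder into a single "Other values (k)" row. A minimal sketch of that folding, applied to the JobRole counts above with a hypothetical cut-off of five (fold_top_n is an illustrative name, not a pandas-profiling function):

```python
from collections import Counter


def fold_top_n(counts, top_n):
    """Keep the top_n most frequent values; fold the rest into one bucket.

    Returns (rows, n_other, other_count), where rows is a list of (value, count).
    """
    ranked = Counter(counts).most_common()
    kept, rest = ranked[:top_n], ranked[top_n:]
    return kept, len(rest), sum(n for _, n in rest)


job_role = {
    "Sales Executive": 217, "Research Scientist": 209, "Laboratory Technician": 166,
    "Manufacturing Director": 97, "Healthcare Representative": 90, "Manager": 74,
    "Sales Representative": 64, "Research Director": 47, "Human Resources": 36,
}
rows, n_other, other_count = fold_top_n(job_role, top_n=5)
total = sum(job_role.values())

for value, n in rows:
    print(f"{value:<27}{n:<7}{100 * n / total:.1f}%")
other_label = f"Other values ({n_other})"
print(f"{other_label:<27}{other_count:<7}{100 * other_count / total:.1f}%")
```

With a cut-off of five, the four least frequent roles collapse into one row of 221 observations.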
 

Length

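Max length: 25
Mean length: 18.024
Min length: 7

Unicode categories (distinct characters)
Value             Count  Frequency (%)
Lowercase_Letter  20     69.0%
Uppercase_Letter  8      27.6%
Space_Separator   1      3.4%

Unicode scripts (distinct characters)
Value   Count  Frequency (%)
Latin   28     96.6%
Common  1      3.4%

Unicode blocks (distinct characters)
Value  Count  Frequency (%)
ASCII  29     100.0%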
 

JobSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value  Count  Frequency (%)
3      321    32.1%
4      306    30.6%
1      188    18.8%
2      185    18.5%
 

Length

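Max length: 1
Mean length: 1
Min length: 1

Unicode categories (distinct characters)
Value           Count  Frequency (%)
Decimal_Number  4      100.0%

Unicode scripts (distinct characters)
Value   Count  Frequency (%)
Common  4      100.0%

Unicode blocks (distinct characters)
Value  Count  Frequency (%)
ASCII  4      100.0%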
 

MaritalStatus
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value     Count  Frequency (%)
Married   469    46.9%
Single    314    31.4%
Divorced  217    21.7%
 

Length

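Max length: 8
Mean length: 6.903
Min length: 6

Unicode categories (distinct characters)
Value             Count  Frequency (%)
Lowercase_Letter  11     78.6%
Uppercase_Letter  3      21.4%

Unicode scripts (distinct characters)
Value  Count  Frequency (%)
Latin  14     100.0%

Unicode blocks (distinct characters)
Value  Count  Frequency (%)
ASCII  14     100.0%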
 

OverTime
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value  Count  Frequency (%)
No     716    71.6%
Yes    284    28.4%
 

TrainingTimesLastYear
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.841
Minimum: 0
Maximum: 6
Zeros: 34
Zeros (%): 3.4%
Memory size: 15.6 KiB

Quantile statistics

Minimum0
5-th percentile1
Q12
median3
Q33
95-th percentile5
Maximum6
Range6
Interquartile range (IQR)1
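TrainingTimesLastYear takes only seven distinct values, so its frequency table (listed for this variable) is enough to rebuild all 1,000 observations and recompute the quantiles above. This sketch assumes linear interpolation between order statistics at position p(n - 1), the default of pandas' `Series.quantile`; Python's `statistics.quantiles(..., method="inclusive")` uses the same rule. The variable names are illustrative.

```python
import statistics

# Frequency table of TrainingTimesLastYear (value -> count).
training_counts = {0: 34, 1: 46, 2: 362, 3: 346, 4: 75, 5: 89, 6: 48}

# Expand the frequency table back into the individual observations.
data = sorted(v for v, n in training_counts.items() for _ in range(n))

# method="inclusive" interpolates linearly at position p * (n - 1).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
ventiles = statistics.quantiles(data, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]  # 5th and 95th percentiles

print("Q1", q1, "median", median, "Q3", q3, "IQR", q3 - q1)
print("5th", p5, "95th", p95, "range", max(data) - min(data))
```

The recomputed values agree with the quantile statistics reported for this variable.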

Descriptive statistics

Standard deviation: 1.300542352
Coefficient of variation (CV): 0.4577762592
Kurtosis: 0.4313947547
Mean: 2.841
Median Absolute Deviation (MAD): 1
Skewness: 0.5676822032
Sum: 2841
Variance: 1.69141041
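The descriptive statistics can likewise be recomputed from the frequency table. The sketch assumes the bias-corrected estimators that pandas applies by default (sample variance with ddof=1, adjusted Fisher-Pearson skewness, Fisher excess kurtosis with small-sample correction) and the median-based MAD; under those assumptions it reproduces the figures shown here and for CommunicationSkill. describe_from_counts and median_absolute_deviation are illustrative names.

```python
import math
import statistics

# Frequency table of TrainingTimesLastYear (value -> count).
training_counts = {0: 34, 1: 46, 2: 362, 3: 346, 4: 75, 5: 89, 6: 48}


def describe_from_counts(counts):
    """Sample statistics from a {value: count} table (pandas-style estimators)."""
    n = sum(counts.values())
    total = sum(v * c for v, c in counts.items())
    mean = total / n

    def central_moment(k):
        # Population (biased) central moment: sum(count * (x - mean)**k) / n.
        return sum(c * (v - mean) ** k for v, c in counts.items()) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    variance = m2 * n / (n - 1)                                # ddof=1
    std = math.sqrt(variance)
    g1 = m3 / m2 ** 1.5                                        # moment skewness
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)           # adjusted Fisher-Pearson
    g2 = m4 / m2 ** 2 - 3                                      # moment excess kurtosis
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    zeros = counts.get(0, 0)
    return {
        "sum": total, "mean": mean, "variance": variance, "std": std,
        "cv": std / mean, "skewness": skewness, "kurtosis": kurtosis,
        "zeros": zeros, "zeros_pct": 100 * zeros / n,
    }


def median_absolute_deviation(counts):
    """Median of |x - median(x)| over the expanded observations."""
    data = [v for v, c in counts.items() for _ in range(c)]
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


stats = describe_from_counts(training_counts)
for name, value in stats.items():
    print(f"{name}: {value:.10g}")
print("MAD:", median_absolute_deviation(training_counts))
```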
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
2      362    36.2%
3      346    34.6%
5      89     8.9%
4      75     7.5%
6      48     4.8%
1      46     4.6%
0      34     3.4%

Minimum 5 values
Value  Count  Frequency (%)
0      34     3.4%
1      46     4.6%
2      362    36.2%
3      346    34.6%
4      75     7.5%

Maximum 5 values
Value  Count  Frequency (%)
6      48     4.8%
5      89     8.9%
4      75     7.5%
3      346    34.6%
2      362    36.2%
 

CommunicationSkill
Real number (ℝ≥0)

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.041
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 4
95th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.413972531
Coefficient of variation (CV): 0.4649695926
Kurtosis: -1.296248835
Mean: 3.041
Median Absolute Deviation (MAD): 1
Skewness: -0.0428380009
Sum: 3041
Variance: 1.999318318
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
5      207    20.7%
4      206    20.6%
3      201    20.1%
2      193    19.3%
1      193    19.3%

Minimum 5 values
Value  Count  Frequency (%)
1      193    19.3%
2      193    19.3%
3      201    20.1%
4      206    20.6%
5      207    20.7%

Maximum 5 values
Value  Count  Frequency (%)
5      207    20.7%
4      206    20.6%
3      201    20.1%
2      193    19.3%
1      193    19.3%
 

OwnStocks
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value  Count  Frequency (%)
1      572    57.2%
0      428    42.8%
 

PropWorkLife
Real number (ℝ≥0)

Distinct count: 346
Unique (%): 34.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.28684550215494325
Minimum: 0.0
Maximum: 0.6727272727272727
Zeros: 7
Zeros (%): 0.7%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.04751082251
Q1: 0.1794871795
Median: 0.2631578947
Q3: 0.4
95th percentile: 0.5661672216
Maximum: 0.6727272727
Range: 0.6727272727
Interquartile range (IQR): 0.2205128205

Descriptive statistics

Standard deviation: 0.1538261788
Coefficient of variation (CV): 0.5362684012
Kurtosis: -0.5050614235
Mean: 0.2868455022
Median Absolute Deviation (MAD): 0.09465881685
Skewness: 0.43071975
Sum: 286.8455022
Variance: 0.0236624933
Histogram with fixed size bins (bins=10)
Common values
Value               Count  Frequency (%)
0.3333333333        22     2.2%
0.2                 21     2.1%
0.5                 19     1.9%
0.25                18     1.8%
0.2222222222        17     1.7%
0.2857142857        16     1.6%
0.3076923077        13     1.3%
0.3225806452        13     1.3%
0.1666666667        12     1.2%
0.4                 12     1.2%
Other values (336)  837    83.7%

Minimum 5 values
Value          Count  Frequency (%)
0              7      0.7%
0.01960784314  1      0.1%
0.02222222222  1      0.1%
0.02631578947  1      0.1%
0.02857142857  6      0.6%

Maximum 5 values
Value         Count  Frequency (%)
0.6727272727  2      0.2%
0.6666666667  1      0.1%
0.6607142857  1      0.1%
0.6603773585  1      0.1%
0.6551724138  1      0.1%
 

PropExpComp
Real number (ℝ≥0)

Distinct count: 154
Unique (%): 15.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.143472619047619
Minimum: 0.0
Maximum: 38.0
Zeros: 7
Zeros (%): 0.7%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.5
Q1: 1.6
Median: 3
Q3: 5
95th percentile: 10.5
Maximum: 38
Range: 38
Interquartile range (IQR): 3.4

Descriptive statistics

Standard deviation: 4.063484827
Coefficient of variation (CV): 0.9806954699
Kurtosis: 17.54727075
Mean: 4.143472619
Median Absolute Deviation (MAD): 1.8
Skewness: 3.270087499
Sum: 4143.472619
Variance: 16.51190894
Histogram with fixed size bins (bins=10)
Common values
Value               Count  Frequency (%)
5                   81     8.1%
3                   59     5.9%
0.5                 57     5.7%
2                   54     5.4%
1                   47     4.7%
4                   47     4.7%
2.5                 42     4.2%
6                   35     3.5%
10                  29     2.9%
4.5                 26     2.6%
Other values (144)  523    52.3%

Minimum 5 values
Value         Count  Frequency (%)
0             7      0.7%
0.3           2      0.2%
0.375         1      0.1%
0.4           3      0.3%
0.4285714286  1      0.1%

Maximum 5 values
Value  Count  Frequency (%)
38     1      0.1%
37     1      0.1%
34     2      0.2%
28     1      0.1%
23     1      0.1%
 

PropRoleComp
Real number (ℝ≥0)

Distinct count: 102
Unique (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47968325589587163
Minimum: 0.0
Maximum: 0.875
Zeros: 169
Zeros (%): 16.9%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.3333333333
Median: 0.5
Q3: 0.6666666667
95th percentile: 0.8503571429
Maximum: 0.875
Range: 0.875
Interquartile range (IQR): 0.3333333333

Descriptive statistics

Standard deviation: 0.2757928221
Coefficient of variation (CV): 0.574947778
Kurtosis: -0.8713633137
Mean: 0.4796832559
Median Absolute Deviation (MAD): 0.1666666667
Skewness: -0.5622562819
Sum: 479.6832559
Variance: 0.07606168074
Histogram with fixed size bins (bins=10)
Common values
Value              Count  Frequency (%)
0                  169    16.9%
0.6666666667       139    13.9%
0.5                135    13.5%
0.3333333333       55     5.5%
0.875              46     4.6%
0.6                46     4.6%
0.7777777778       38     3.8%
0.4                34     3.4%
0.7                25     2.5%
0.8181818182       21     2.1%
Other values (92)  292    29.2%

Minimum 5 values
Value          Count  Frequency (%)
0              169    16.9%
0.0625         1      0.1%
0.06666666667  1      0.1%
0.08695652174  1      0.1%
0.1071428571   1      0.1%

Maximum 5 values
Value         Count  Frequency (%)
0.875         46     4.6%
0.8666666667  1      0.1%
0.8571428571  3      0.3%
0.85          1      0.1%
0.8461538462  3      0.3%
 

PropEducationAgeWy
Real number (ℝ≥0)

Distinct count: 190
Unique (%): 19.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06524792881469622
Minimum: 0.011235955056179775
Maximum: 0.16666666666666666
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0.01123595506
5th percentile: 0.0249695122
Q1: 0.04347826087
Median: 0.06138147567
Q3: 0.08510638298
95th percentile: 0.1142857143
Maximum: 0.1666666667
Range: 0.1554307116
Interquartile range (IQR): 0.04162812211

Descriptive statistics

Standard deviation: 0.02851788693
Coefficient of variation (CV): 0.437069612
Kurtosis: 0.1831752779
Mean: 0.06524792881
Median Absolute Deviation (MAD): 0.019714809
Skewness: 0.5724322994
Sum: 65.24792881
Variance: 0.0008132698751
Histogram with fixed size bins (bins=10)
Common values
Value               Count  Frequency (%)
0.09090909091       27     2.7%
0.07142857143       25     2.5%
0.1                 22     2.2%
0.05882352941       21     2.1%
0.0625              21     2.1%
0.05                20     2.0%
0.05555555556       19     1.9%
0.08333333333       18     1.8%
0.07692307692       18     1.8%
0.06666666667       17     1.7%
Other values (180)  792    79.2%

Minimum 5 values
Value          Count  Frequency (%)
0.01123595506  1      0.1%
0.01265822785  2      0.2%
0.01282051282  1      0.1%
0.01298701299  1      0.1%
0.01351351351  1      0.1%

Maximum 5 values
Value         Count  Frequency (%)
0.1666666667  4      0.4%
0.1515151515  2      0.2%
0.15          3      0.3%
0.1428571429  5      0.5%
0.1379310345  1      0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope, i.e. the angle of the line to the x-axis, does not affect r; only its sign does.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
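The recipe above translates directly into a few lines of pure Python. pearson_r is an illustrative helper, not a pandas-profiling API; it raises ZeroDivisionError if either variable is constant, since r is undefined in that case.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their std devs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/(n - 1) factors of the covariance and both std devs cancel out.
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # exact positive linear relation
print(pearson_r(x, [-0.5 * v for v in x]))   # exact negative linear relation
```

Changing the slope from 2 to any other positive value leaves r at 1, which is the invariance described above.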

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r applied to the ranks.
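A sketch of that computation: rank each variable, then apply Pearson's formula to the ranks. Tied values receive the average of the ranks they occupy, a common convention assumed here; average_ranks and spearman_rho are illustrative names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # monotonic but nonlinear
```

For the cubic relation, Pearson's r is below 1 while ρ reaches 1, which is the advantage described above.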

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. (This is the τ-a variant; when ties are present, the tie-adjusted τ-b is commonly used.)
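A direct O(n²) sketch of the pair counting described above (τ-a; kendall_tau_a is an illustrative name). Pairs tied in either variable count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0: tie in x or y, counted in neither group.
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666 (5 concordant, 1 discordant)
```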

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
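For reference, a sketch of the bias correction as it is usually stated for Bergsma's estimator: φ² = χ²/n is shrunk to max(0, φ² - (k - 1)(r - 1)/(n - 1)), and the table dimensions r and k are replaced by r - (r - 1)²/(n - 1) and k - (k - 1)²/(n - 1) before taking the square root. The function name and the list-of-rows input format are assumptions for illustration; this is not pandas-profiling's internal code.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramer's V for an r x k contingency table.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Shrink phi^2 and the table dimensions toward their unbiased values.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```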

Missing values

Sample

First rows

     Attrition  BusinessTravel     DistanceFromHome  EnvironmentSatisfaction  JobRole                    JobSatisfaction  MaritalStatus  OverTime  TrainingTimesLastYear  CommunicationSkill  OwnStocks  PropWorkLife  PropExpComp  PropRoleComp  PropEducationAgeWy
0    0          Non-Travel         2                 3                        Laboratory Technician      4                Single         No        2                      4                   0          0.400000      12.000000    0.583333      0.071429
1    0          Travel_Rarely      12                3                        Manufacturing Director     3                Married        Yes       2                      2                   1          0.194444      0.700000     0.500000      0.093023
2    1          Travel_Rarely      2                 3                        Sales Executive            4                Single         No        3                      5                   0          0.218182      2.400000     0.700000      0.014925
3    0          Travel_Rarely      24                1                        Research Scientist         4                Single         No        2                      4                   0          0.461538      2.250000     0.875000      0.017544
4    0          Travel_Rarely      3                 3                        Manufacturing Director     3                Married        No        2                      1                   1          0.270270      5.000000     0.636364      0.063830
5    0          Travel_Rarely      7                 2                        Sales Representative       3                Married        No        2                      2                   0          0.419355      3.250000     0.875000      0.090909
6    1          Travel_Rarely      1                 4                        Laboratory Technician      3                Single         Yes       2                      1                   0          0.125000      4.000000     0.500000      0.083333
7    0          Travel_Rarely      4                 1                        Laboratory Technician      2                Married        No        5                      5                   0          0.242424      0.888889     0.666667      0.097561
8    0          Travel_Frequently  11                4                        Sales Executive            4                Divorced       No        3                      4                   1          0.142857      2.500000     0.333333      0.050000
9    1          Travel_Rarely      7                 2                        Sales Representative       2                Single         No        3                      5                   0          0.047619      0.500000     0.000000      0.045455

Last rows

     Attrition  BusinessTravel     DistanceFromHome  EnvironmentSatisfaction  JobRole                    JobSatisfaction  MaritalStatus  OverTime  TrainingTimesLastYear  CommunicationSkill  OwnStocks  PropWorkLife  PropExpComp  PropRoleComp  PropEducationAgeWy
990  0          Travel_Frequently  6                 1                        Laboratory Technician      1                Married        Yes       3                      5                   1          0.250000      2.250000     0.500000      0.088889
991  0          Travel_Frequently  10                4                        Research Scientist         3                Divorced       Yes       4                      1                   1          0.147059      2.500000     0.500000      0.102564
992  0          Travel_Rarely      1                 3                        Healthcare Representative  4                Divorced       No        5                      3                   1          0.487179      19.000000    0.526316      0.051724
993  1          Travel_Frequently  9                 3                        Sales Executive            3                Married        No        3                      3                   0          0.531915      3.125000     0.208333      0.041667
994  0          Travel_Rarely      7                 3                        Healthcare Representative  1                Married        No        1                      1                   1          0.232558      1.111111     0.000000      0.056604
995  0          Non-Travel         10                2                        Sales Executive            4                Single         No        3                      4                   0          0.277778      5.000000     0.272727      0.086957
996  0          Travel_Rarely      16                3                        Manufacturing Director     4                Single         Yes       2                      2                   0          0.450000      2.571429     0.400000      0.051724
997  1          Travel_Rarely      9                 3                        Sales Executive            4                Single         No        3                      4                   0          0.195652      4.500000     0.800000      0.036364
998  0          Travel_Rarely      2                 3                        Manufacturing Director     4                Single         Yes       4                      5                   0          0.400000      2.000000     0.000000      0.071429
999  0          Travel_Frequently  2                 3                        Sales Executive            2                Married        No        2                      3                   1          0.566038      10.000000    0.437500      0.036145